Abstract



In this paper, we use a feature model of the semantics of plural deter- 
miners to present an approach to grammar checking for definiteness. Using 
neural network techniques, a semantics - morphological category mapping 
was learned. We then applied a textual encoding technique to the 125
occurrences of the relevant category in a 10 000 word narrative text and
learned a surface - semantics mapping. By applying the learned generation
function to the newly generated representations, we achieved a correct
category assignment in many cases (87%). These results are considerably
better than a direct surface categorization approach (54 %), with a
baseline (always guessing the dominant category) of 60 %. We discuss how
these results could be used in multilingual NLP applications.



1 Introduction 

Most uses of the definiteness category in English are grammatically constrained, 
i.e. a substitution of a definite for an indefinite determiner and vice versa leads 
to ungrammatical sentences. In this paper, we use a model of the semantics of 
plural determiners to present an approach to automatic generation of the correct 
determiner. We have identified a set of semantic features for the description of 
relevant meanings of plural definiteness. A small training set (30 sentences) was
created according to linguistic criteria, and a functional mapping from the
semantic feature representation to the overt category of indefinite/definite article
was learned using neural network techniques. We have then provided a surface- 
oriented textual encoding of a 10000 word text corpus. We removed the target 
category in each relevant plural noun occurrence, and automatically generated 
semantic representations from the encoded text. Because texts are semantically 
underdetermined, and the text encoding technique involves a further huge re- 
duction of information content, these representations have some degree of noise. 
However, in generation we can assign the correct category in many cases (87%). 
These results are put into perspective with experiments on surface categorization 
of sentences, i.e. applying learning techniques without the benefit of semantic 
representations. 

The basic methodology in designing a semantic feature representation consists
in finding a set of semantic dimensions which correspond to the logical
distinctions expressed by a certain grammatical category (cf. [Kamp and Reyle,
1993], [Link, 1991a], [Link, 1991b], [Scheler, 1996]). In the case of definite
determiners, we have chosen the dimensions of givenness (i.e. type of anaphoric
relation), of quantification, of type of reference (i.e. predication or
denotation), of boundedness (i.e. mass reference or individual reference), and
of collective agency. The different logical forms of the sentences can be
represented by a set of sentential operators, which are defined in first-order
logic. These sentential operators can be used as atomic semantic features, which
are consequently sufficient in representing the logical meaning of a sentence
with respect to the chosen semantic dimensions. This approach is significantly
different from POS or sense-tagging systems such as [Yarowsky, 1991],
[Schmid, 1994], [Brill, 1993], [Church, 1988], [Jelinek, 1985]. A complete list
of semantic features and dimensions is given in the appendix. A semantic feature
set is sufficient for the explanation of a given morphological category if it is
possible to generate this category from the corresponding feature
representation.

The paper is structured as follows: First, we present an experiment in learning 
a generation function, i.e. a mapping from semantic representations to surface 
categories. Then we explain the principles of textual coding that we have used for 
the semantic feature extraction experiments. Finally, we show how these mapping 
functions can be combined to provide a grammar checker for the definiteness 
category of English, and discuss possible applications in multilingual NLP. 



2 From semantic features to morphological expression

The question that has been investigated by the first experiment is the adequacy 
of a semantic representation for noun phrases which consists of the semantic
dimensions and individual features given in the appendix. In particular, we wanted
to know how a functional assignment that has been learned by a set of linguis- 
tically chosen examples carries over to instances of the relevant phenomenon in 
real texts. 



2.1 Method 

In order to answer this question, we use a connectionist method of supervised
learning ("quickprop" [Fahlman, 1988], a variant of the back-propagation
algorithm), as implemented in the SNNS system (cf. [Zell and others, 1993]).
Supervised learning requires a set of training examples, i.e. cases where both
the input and the output of a function are given. From these examples a mapping
function is created, which generalizes to new patterns of the same kind.
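As a rough illustration of how quickprop differs from plain back-propagation, the sketch below applies a simplified form of Fahlman's update to a single weight: the current and previous gradients are used to fit a parabola, and the weight jumps towards its minimum. This is not the SNNS implementation. The function name `quickprop_step`, the growth cap `mu`, and the toy error function are assumptions made for the example.

```python
def quickprop_step(grad, prev_grad, prev_step, mu=1.75):
    """One simplified quickprop update for a single weight.

    grad, prev_grad: current and previous error gradients dE/dw.
    prev_step: the previous weight change.
    mu: maximum growth of a step relative to the previous one (assumed value).
    """
    denom = prev_grad - grad
    if denom == 0.0:
        return 0.0  # no curvature information; this sketch simply stops
    step = grad / denom * prev_step
    limit = mu * abs(prev_step)
    return max(-limit, min(limit, step))


def error_gradient(w):
    # Gradient of the toy error E(w) = (w - 3)^2, used only for illustration.
    return 2.0 * (w - 3.0)


w = 0.0
prev_grad = error_gradient(w)
step = -0.1 * prev_grad  # bootstrap with one plain gradient-descent step
w += step
for _ in range(5):
    grad = error_gradient(w)
    step = quickprop_step(grad, prev_grad, step)
    prev_grad = grad
    w += step
```

On this quadratic toy error the weight reaches the minimum at w = 3 within a few updates; real error surfaces are not parabolic, so the behaviour there will differ.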

We created a small training corpus for typical occurrences of bare plurals and
definite plurals. Grammars written for second language learning often provide a
good possibility of obtaining a small sample of individual sentences, designed to
cover all possible uses of a specific category in discourse. 30 example sentences
with distinct feature representations were adapted from [Thompson and Martinet,
1969]. For these examples, semantic feature representations were created by
hand. Neutral values (*) were also included. Inter-subject agreement on the
tagging of the data was 94 % for two subjects (myself and a student), i.e. there
was disagreement on 37 tags (out of 625), most of which (22) concerned the
category of anaphoric relation.

In principle, there is a better way of judging the correctness of the feature
representation, as each of these features refers to the logical interpretation of the
sentence. This means that the feature representation can serve as an intermediate
step in creating a cognitive representation expressed in first-order logic, in the
same way as it has been realized in [Scheler and Schumann, 1995] for aspectual
categories. Correctness may then be tested by creating a set of inferences for
each sentence. However, this work is only experimental at present, and has not
yet been performed for definite and indefinite noun phrases. Finally, the value of
the chosen feature set and individual representations becomes apparent when we
use these representations in the chosen task of generating correct determiners for
deliberately truncated (i.e. minus the value for the target category) sentences.

The symbolic descriptions were translated into binary patterns using 1-of-n 
coding. The assignment of the correct output category consisted in a binary 
decision, namely, definite plural or bare (indefinite) plural. 
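The 1-of-n coding can be sketched as follows, using the five dimensions and fifteen features listed in the appendix. The ordering of dimensions and bits, the dimension names, and the choice to encode a neutral value (*) as all zeros within its dimension are assumptions of this sketch; the paper does not specify them.

```python
# Semantic dimensions and features as listed in the appendix.
# The order of dimensions and of bits within them is assumed.
DIMENSIONS = [
    ("anaphoric", ["given", "implied", "new"]),
    ("quantification", ["num", "unique", "some", "most", "all", "general"]),
    ("reference", ["denotation", "predication"]),
    ("boundedness", ["mass", "pieces"]),
    ("agentive", ["collective", "distributive"]),
]


def encode(values):
    """1-of-n encode one value per dimension; '*' (neutral) sets no bit."""
    if len(values) != len(DIMENSIONS):
        raise ValueError("expected one value per dimension")
    bits = []
    for (name, features), value in zip(DIMENSIONS, values):
        if value != "*" and value not in features:
            raise ValueError(f"unknown {name} feature: {value}")
        bits.extend(1 if value == feature else 0 for feature in features)
    return bits


# "He gives wonderful PARTIES." from the training set
parties = encode(["new", "general", "predication", "pieces", "*"])
```

Each pattern has 15 bits, with at most one bit set per dimension.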

He gives wonderful PARTIES.
new  general  predication  pieces  *
indef

The MUSICIANS are practicing a new piece.
given  all  reference  pieces  collective
def

They were discussing BOOKS and the theater.
new  general  predicative  mass  *
indef

Table 1: Examples from the training set: Sentences, semantic representations,
and grammatical category

We wanted to know how such a set of training examples relates to the patterns
found in real texts. Accordingly, we tested the acquired classification on a
narrative text ("Cards on the table" by A. Christie), for which the first 5
chapters were taken, with a total of 9332 words. Every occurrence of a plural noun
without a possessive or demonstrative pronoun formed part of the dataset.
Modification by a possessive pronoun (my friends) or a demonstrative pronoun
(those people) leads to a neutralization of the indefiniteness/definiteness
distinction as expressed by a determiner. Generating possessive or demonstrative
pronouns is beyond the goals of this research. As a result, there were 125
instances of definite or bare plural nouns. Of these, 75 instances had no
determiner (the dominant category), and 50 instances had the determiner "the".
This provides a baseline of guessing at 60%. For the text cases, another set of
semantic representations was manually created.
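The guessing baseline follows directly from the class counts. A minimal sketch, where the function name `majority_baseline` is chosen for this example only:

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always guessing it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Class counts reported for the 125 text cases.
labels = ["indef"] * 75 + ["def"] * 50
label, accuracy = majority_baseline(labels)
```

Here `label` is the bare (indefinite) plural and `accuracy` is 75/125.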

2.2 Results 

The mapping from semantics to grammatical category for the example sentences 
could be learned perfectly, i.e. any semantic representation was assigned its 
correct surface category. 

The learned classifier was then applied to the cases derived from the running
text. A high percentage of correctness (97 %) could be achieved (cf. Table 2).

This result is remarkable, as it involves a generalization from linguistically
selected, 'made-up' examples to real textual occurrences. We may assume that
the selected set of semantic features describes the relevant semantic dimensions
of the surface category of definiteness. We also examined the few remaining
misclassifications (cf. Table 3). They are due to stylistic peculiarities, as in
sentences 45 and 89. Also, two sentences involving numerals were not classified
correctly. This use has probably not been sufficiently covered by the training set.

We have succeeded in learning a generation function from semantic representations
with remarkably few wrong assignments. The remaining problems with functional
assignment which are due to stylistic variation are fewer than we expected, but they
may go beyond an analysis in terms of semantic-logical features.

Learning    Generalization
Table 2: Mapping from semantic representation to output category



45 INTRODUCTIONS completed, he gravitated naturally to the side of Colonel Race.
given  all  predication  mass  collective
indef

89 I held the most beautiful CARDS yesterday.
new  some  predication  pieces  *
def

94 He saw four EXPRESSIONS break up - waver.
implied  num  predication  pieces  distributive
indef

118 Yes. That's to say, I passed quite near him THREE TIMES.
implied  num  predication  pieces  *
indef

Table 3: Misclassifications of the text cases

3 Semantic feature extraction from Text 

For the goal of cognitive modeling it is interesting to look at the kind of semantic 
representations necessary to explain attested morphological categories and their 
use. For practical purposes, however, semantic representations cannot be manu- 
ally created. They have to be derived from running text by automatic methods. 
This is a goal that is not easy to reach. 

First of all, texts are semantically underdetermined. They do not contain
all the information present in a speaker's mind that corresponds to a full logical
representation. Fortunately, these logical representations are often redundant for
the selection of a grammatical category, so that a noisy representation may be
sufficient for practical NLP tasks such as text understanding, machine translation
or grammar checking. Secondly, there remains the problem of how to represent
or code a text so as to derive a maximum of semantic information from it while
reducing its overall information content, which otherwise puts too much burden
on any current learning technique (in particular the large number of different
lexical words).

In this paper we wanted to look at the possibility of using a neural network
learning approach to syntax-semantics mapping for grammar checking, i.e. the
automatic correction of the definiteness category in a running text. This could
be a valuable feature in a foreign language editor; it is also a significant part of
any translation system.

3.1 Text Encoding 

The text encoding technique should have two important properties: 

• reducing the informational content of a text without losing its essential 
parts for the task at hand 

• using only readily accessible surface information, and limiting pre-processing 
to a minimum 

For the former goal we have provided representations using essentially two
syntactic schemas:
NP - predicate - NP and
NP - preposition - NP.

This is a fairly radical approach to reducing syntactic complexity, and it is
possible that more detailed representations of syntactic relations would prove an
asset in semantic feature extraction (alternative approaches to text encoding
are contained in [Bauer, 1995] and [Scheler, 1994]). However, the advantage of
this simplistic scheme is that we can use a single fixed-length slot-value
representation which fits the local context of most noun phrases. The diversity of
lexical items has been reduced by substituting each lexical word by high-level
syntactic-semantic features as derived from WordNet [Miller and others, 1995].
Functional words and morphology have been reduced to singular/plural and
definite/indefinite distinctions. The full textual encoding scheme looks as follows:

1. head noun

2. adjectival/adverbial modifiers

3. number (singular/plural)

4. definiteness (indef/def or qu)

5. predicate or preposition

6. dependent noun

7. adjectival/adverbial modifiers

8. number (singular/plural)

9. definiteness (indef/def or qu)

3 VOICES drawled or murmured.
perceptual_entity * plural qu action * * * *

4 in aid of the London HOSPITALS.
event * singular indef prep institution desc_adj plural qu

5 a Lovely Young Thing with tight poodle CURLS.
object desc_adj singular indef prep body_part desc_adj plural qu

7 He wore a moustache with stiff waxed ENDS.
body_part * singular indef prep part desc_adj plural qu

Table 4: Examples for surface textual coding

Values in the slots are lexical classes for head noun, predicate and dependent noun
(e.g., perceptual_entity, physical_object, body_part, person, communication) and
grammatical classes for modifiers (e.g., adjective, numeral, demonstrative). The
difficult problem of word sense ambiguity, which arises even on the level of primary
lexical classes, or syntactic-semantic features, was circumvented by assigning the
most frequent lexical class to a lexical word, measured in terms of its different
word senses. An easy alternative, namely using all lexical classes in a distributed
lexical encoding, was not explored here. Some examples are given in Table 4.
Using 1-of-n coding, we get 53 bits (i.e. 53 features) in 9 slots. We constructed
another neural network with a 53-20-15 architecture (input-hidden-output layer),
where 20 hidden units proved to be optimal for the given problem, and tried to
learn a mapping function from the surface encoding to the semantic layer (15
features).
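The most-frequent-class heuristic for word sense ambiguity can be sketched as below. The sense inventory shown is hypothetical and does not reproduce WordNet's actual entries; the function name and the tie-breaking rule (the first-listed class wins) are likewise assumptions.

```python
from collections import Counter


def most_frequent_class(sense_classes):
    """Pick the lexical class shared by the largest number of a word's senses.

    Ties are resolved in favour of the class that is listed first.
    """
    return Counter(sense_classes).most_common(1)[0][0]


# Hypothetical sense inventory: one lexical class per listed word sense.
SENSES = {
    "party": ["group", "event", "group", "person", "group"],
    "voice": ["perceptual_entity", "communication"],
}

party_class = most_frequent_class(SENSES["party"])
voice_class = most_frequent_class(SENSES["voice"])
```

Each lexical word then contributes exactly one class value to its slot, which keeps the representation at a fixed length.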

3.2 Experiments 

In order to investigate the possibilities of grammar checking, we left out the
definiteness category for the target noun phrase, i.e. substituted indef/def by qu
for a single noun phrase per sentence.
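Removing the target category can be sketched on the nine-slot coding as follows. The slot names are invented for this example, and the unmasked coding of sentence 7 is reconstructed here for illustration (its target noun ENDS is a bare plural, hence indef).

```python
# Hypothetical names for the nine slots of the textual encoding scheme.
SLOT_NAMES = [
    "head_noun", "head_modifier", "head_number", "head_definiteness",
    "predicate_or_preposition",
    "dep_noun", "dep_modifier", "dep_number", "dep_definiteness",
]


def parse_coding(line):
    """Turn a whitespace-separated nine-slot coding into a slot -> value dict."""
    values = line.split()
    if len(values) != len(SLOT_NAMES):
        raise ValueError("expected nine slot values")
    return dict(zip(SLOT_NAMES, values))


def mask_target(record, target_slot):
    """Replace the target definiteness value by 'qu'; return the copy and the label."""
    if record[target_slot] not in ("indef", "def"):
        raise ValueError("target slot carries no definiteness value")
    masked = dict(record)
    masked[target_slot] = "qu"
    return masked, record[target_slot]


# "He wore a moustache with stiff waxed ENDS." before masking (reconstructed).
full = parse_coding("body_part * singular indef prep part desc_adj plural indef")
masked, label = mask_target(full, "dep_definiteness")
masked_line = " ".join(masked[name] for name in SLOT_NAMES)
```

The removed value (`label`) is kept aside as the category that the generation step later has to recover.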

We have used cross-validation by leaving-one-out for the 125 cases. The number
of examples is still fairly small for surface-semantics mapping; accordingly,
we had to use the strong reduction in information outlined above to obtain a
noticeable generalization effect. In some cases the resulting textual representations
look alike although there are differences in semantic content, which is a major
problem for the learning technique used. A learning technique that is less
sensitive to conflicting data would probably improve the performance. The
results for learning and for generalization have been split up by the number of
errors per pattern. They are given in Table 5.

Table 5: Mapping from encoded surface text to semantic representation

These results amount to an overall average of 2.73 errors per pattern, where
15 bits had to be set. Our main goal was to generate a set of semantic feature
representations from sentences without target categories, and test how much noise
the previously learned generation function can tolerate.
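Leave-one-out evaluation with errors counted per pattern can be sketched as below. The nearest-neighbour learner is only a stand-in that makes the example runnable; it is not the network used in the experiments, and the toy patterns are invented.

```python
def hamming_errors(predicted, target):
    """Number of bits on which two equal-length binary patterns differ."""
    if len(predicted) != len(target):
        raise ValueError("patterns must have equal length")
    return sum(p != t for p, t in zip(predicted, target))


def leave_one_out(cases, fit, predict):
    """Hold out each (input, target) pair once; return errors per held-out pattern."""
    errors = []
    for i, (x, target) in enumerate(cases):
        model = fit(cases[:i] + cases[i + 1:])
        errors.append(hamming_errors(predict(model, x), target))
    return errors


# Stand-in learner: return the target of the closest training input.
def fit_nearest(training):
    return list(training)


def predict_nearest(model, x):
    return min(model, key=lambda pair: hamming_errors(pair[0], x))[1]


# Invented toy patterns (4 input bits, 3 output bits).
toy_cases = [
    ([1, 0, 0, 1], [1, 0, 0]),
    ([1, 0, 1, 1], [1, 0, 0]),
    ([0, 1, 0, 0], [0, 1, 1]),
    ([0, 1, 1, 0], [0, 1, 0]),
]
errors = leave_one_out(toy_cases, fit_nearest, predict_nearest)
mean_errors = sum(errors) / len(errors)
```

In the experiments the patterns would be the 53-bit surface codings and 15-bit semantic targets, and `mean_errors` corresponds to the average number of errors per pattern.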



4 Grammar checking for determiners 

We have observed before that most uses of plural determiners in English are 
grammatically constrained, and in many cases these grammatical constraints are 
evident even from single sentences, without further textual context. 

4.1 Method 

In order to determine whether a specific use of a determiner is sententially
constrained, we gave the list of 125 sentences with the target categories changed
to three native speakers. We found that speakers agreed on a core of 15 sentences
which were considered acceptable with the opposite category, and obtained a total
of 22 sentences which at least one speaker judged grammatical. This means that in
103 out of the 125 plural noun occurrences, speakers of English seem to have
no choice in the use of the determiner. By excluding the 22 sentences with
'free variation' we bypass problems of textual coreference and anaphora, which
also play an important role in determiner selection (cf. [Aone and Bennett, 1996],
[Connolly et al., 1995] for learning approaches to anaphora resolution). Still, the
number of text cases that are narrowly sententially constrained is fairly high.
For the remaining cases, we took the generalized semantic representations from
the previous experiment, and tested the performance with the learned generation
function.
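The split into core, 'free variation', and sententially constrained cases can be sketched from per-speaker judgments. The data layout and the toy judgments below are assumptions made for illustration.

```python
def split_by_acceptability(judgments):
    """Partition sentences by how speakers judged the opposite determiner.

    judgments maps a sentence id to a list of booleans, one per speaker
    (True = the sentence with the opposite category was judged acceptable).
    """
    core = {s for s, votes in judgments.items() if all(votes)}
    free = {s for s, votes in judgments.items() if any(votes)}
    constrained = set(judgments) - free
    return core, free, constrained


# Invented judgments of three speakers for five sentences.
toy_judgments = {
    1: [True, True, True],
    2: [True, False, False],
    3: [False, False, False],
    4: [False, False, False],
    5: [False, True, True],
}
core, free, constrained = split_by_acceptability(toy_judgments)
```

With the figures reported above, the same partition yields 15 core sentences, 22 sentences with free variation, and 125 - 22 = 103 constrained cases.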



4.2 Results 

The results were encouraging: in many cases (89, i.e. 87 %) the system made
the correct binary choice. Note that these are generation data on representations
that were derived from unseen, only surface-encoded text. When we look at the
relation between errors per pattern and generation performance (cf. Table 6), a
clear picture emerges. While the generation function is fault-tolerant to a degree
(approx. < 2 errors), its performance decreases when the number of errors per
pattern exceeds a certain limit (> 2 errors), up to the point where we can only
reproduce chance level (> 4 errors).

[Plot: correctness of generation (vertical axis; marked values 96%, 87% total,
50%, 42%) against no. of errors per pattern (horizontal axis; ticks < 2, 2.7,
< 4, > 4)]

Table 6: Generation from automatically derived semantic representations

We may also compare this approach to a direct textual categorization
approach. In this case we used the textual encoding (53 bits) and tried to learn the
morphological category by direct supervised learning, i.e. instead of the 53-20-15
net for semantic feature extraction, coupled with a 15-5-2 net for generation, we
used a single 53-X-2 net (where X was optimal at 10) and repeated the learning
process for the 103 examples. The results were significantly worse; they did not
exceed chance level (cf. Table 7). Including extra hidden layers for automatic
construction of a "semantic layer", i.e. a 53-20-10-15 net, did not significantly
improve these results (58/56%).

Table 7: Determiner Selection as Classification of Surface Sentences
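The difference between the coupled and the direct architecture can be sketched as a forward pass through small sigmoid networks. The weights below are random and untrained, so the outputs carry no meaning; the thresholding of the intermediate semantic layer and the order of the two output units are assumptions of this sketch.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights, one row per output unit; the last entry is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def make_net(sizes, rng):
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, inputs):
    """Propagate an input vector through fully connected sigmoid layers."""
    activation = list(inputs)
    for layer in layers:
        activation = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activation)) + row[-1])))
            for row in layer
        ]
    return activation


rng = random.Random(0)
extraction_net = make_net([53, 20, 15], rng)  # surface coding -> semantic features
generation_net = make_net([15, 5, 2], rng)    # semantic features -> indef/def
direct_net = make_net([53, 10, 2], rng)       # surface coding -> indef/def

CATEGORIES = ("indef", "def")  # assumed order of the two output units


def coupled_choice(surface_bits):
    semantic = forward(extraction_net, surface_bits)
    # Passing thresholded (binary) features on is an assumption of this sketch.
    semantic_bits = [1 if s >= 0.5 else 0 for s in semantic]
    scores = forward(generation_net, semantic_bits)
    return CATEGORIES[scores.index(max(scores))]


def direct_choice(surface_bits):
    scores = forward(direct_net, surface_bits)
    return CATEGORIES[scores.index(max(scores))]
```

In the coupled variant the 15-unit layer is trained against explicit semantic targets, whereas the hidden layer of the direct net receives no such supervision.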

5 Applications in Multilingual NLP 

The main task of this paper has been to identify a set of semantic features for
the description of the definiteness category in English and apply it to instances
of plural nouns in a real text. An application to grammar checking has been
spelled out in the previous section. The results lead us to expect that with the
development of a more sophisticated textual coding, we may have a practical tool
for checking and correcting definiteness of English plural nouns.

The work reported here can also be used for multilingual interpretation and
generation. This is especially interesting for languages without nominal
determiners, such as Japanese or Russian. In these cases other grammatical
information that is provided in the surface coding, e.g. Japanese particles with
topic/comment contrast combining the agentive/givenness dimensions and Japanese
word order and nominal classifiers, can be used to set the semantic features of the
intermediate, interlingual representation (cf. [Wada, 1994]). Generation of an
English determiner can then be handled by the unilingual learned generation
function.

The history of machine translation and text understanding has shown that
mere surface scanning and textual matching approaches tend to level off as they
have no capacity for improving performance beyond that of the statistical data
analysis tool [Nirenburg et al., 1992]. In contrast, using explicit semantic
representations which can be linked to cognitive models provides a basis for both
human language understanding and practical NLP. Flat surface analysis may
perform much better with huge data sets and less information reduction. Still, using
semantic representations has additional advantages for interactive systems both
for grammar checking and machine translation. The additional plane of semantic
representation allows a system to assess the validity of a given decision and frame
a question in other cases.

In order for the envisaged system to have real practical use, two kinds of
additions are necessary (in addition to the general task of improving the
performance of the classifier):

• a textual encoding scheme that incorporates a method for coreference res- 
olution to set features in the dimension of anaphoric meaning reliably 

• a confidence measure for the proposed determiner, which would make a 
remaining margin of error tolerable to a user. 

The confidence measure could be composed of a value for the generation com- 
ponent which would depend on the completeness of the semantic representation, 
and a value for the analysis component, which would code the availability of tex- 
tual features and the probability values of the semantic feature assignment (e.g. 
0.9 or 0.6 "collective" etc.). 
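One possible reading of such a confidence measure is sketched below. The paper does not fix a formula: taking completeness as the share of semantic dimensions with an assigned feature, taking the analysis value as the mean assignment probability, and multiplying the two are all assumptions of this sketch, as are the example probabilities.

```python
def confidence(feature_probs, n_dimensions=5):
    """Combine completeness and assignment probabilities into one value.

    feature_probs maps each assigned semantic feature to the probability of
    its assignment; dimensions without an assigned feature are simply absent.
    """
    if not feature_probs:
        return 0.0
    completeness = len(feature_probs) / n_dimensions
    analysis = sum(feature_probs.values()) / len(feature_probs)
    return completeness * analysis


# Invented probabilities for a fully specified representation.
full = confidence(
    {"given": 0.9, "all": 0.8, "denotation": 0.9, "pieces": 0.8, "collective": 0.6}
)
# The same assignment with the agentive dimension left unset.
partial = confidence({"given": 0.9, "all": 0.8, "denotation": 0.9, "pieces": 0.8})
```

A system could then propose a determiner only above some threshold and ask the user otherwise.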

With these improvements the system could be a useful tool for anyone who 
uses a foreign language and encounters frequent doubts of grammatical correct- 
ness which no written grammar can answer: * "He answered me with the raised 
eyebrows" is incorrect, but "with raised eyebrows" or "with the eyebrows raised 
in a mocking twist" is fine. 

A Appendix: Semantic dimensions and features 

A.1 Generalized quantification

1. num quantifier with an explicit quantity, e.g. four, five etc. 

2. unique a plural object may also be unique, for instance, the arts, the
London hospitals. This is possible when it has a collective identity (see below).

3. some an unspecified quantity, which constitutes a small percentage 

4. most an unspecified quantity, which constitutes a large percentage 

5. all universal quantification, constrained with respect to the discourse set- 
ting 

6. general universal quantification, unconstrained with respect to discourse, 
but pragmatically constrained 

A.2 Anaphoric relation

7. given noun phrase with a co-referring antecedent

8. implied noun phrase which refers to an object implied by a lexical relation

9. new noun phrase that introduces a new referent

A.3 Reference to discourse objects

10. denotation noun phrase that denotes an object term in discourse (e.g., He 
was walking about in the park) 

11. predication noun phrase that denotes a property in discourse (where a 
property is a one-place relation of a discourse object) (e.g., It's more a park 
than a garden) 

A.4 Boundedness

12. mass reference to an unbounded quantity of one kind (e.g., a Lovely Young 
Thing with tight poodle CURLS) 

13. pieces reference to a collection of individuals (e.g., Those dreadful
policewomen in funny HATS who bother people in parks!)

A.5 Agentive involvement

14. collective a plural noun referring to a set of individuals and a common action
(e.g., The two girls sang a duet.)

15. distributive a plural noun referring to a set of objects and individual
actions (e.g., Four people brought a salad to the party.)